Centralized Content-Based Web Filtering and Blocking: How Far Can It Go?
Abstract
For organisations, centralized Internet filtering and blocking matters for several reasons. With the flood of pornographic material on the Web, educators and parents want to keep children away from this offensive content. Companies also want to reduce the work time their employees spend on non-productive Web surfing. Current blocking and filtering mechanisms fall roughly into two approaches: URL-based filtering and content filtering. In the URL-based approach, a requested URL is blocked if it matches an entry in a blocklist. Keeping that list up to date, however, is very difficult: new sites are uploaded to the Internet daily, many blocked sites use multiple IP addresses and domain names, and sites may also move regularly. In the content filtering approach, keyword matching is commonly used. Its main problem is mis-blocking: many desirable Web sites are blocked because predefined keywords appear in their pages, even though the words are used with a different meaning or in a different context. There have even been proposals to apply image, audio, and video understanding to real-time content filtering. However, the added delay, and the mismatch between HTTP's streaming delivery and the complexity of such filtering algorithms, are major concerns. In this paper, we investigate how far multimedia content analysis should go for Internet filtering and blocking. We also give a set of guidelines for defining the heuristics used in real-time Web content analysis. These heuristics not only achieve higher filtering accuracy than most multimedia retrieval techniques, but also incur runtime overhead comparable to that of keyword matching. We also describe our one-year experience deploying a pornography filtering system in high schools. The experience gained from implementing and deploying the system provides useful direction for centralized filtering and blocking of Web content.
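The two conventional approaches contrasted above can be illustrated with a minimal sketch. This is not the filtering system or the heuristics described in the paper: the hostnames, the keyword list, and the function names `url_blocked`, `keyword_blocked_substring` and `keyword_blocked_token` are hypothetical, chosen only to show why a blocklist misses a site that reappears under a new domain or a raw IP address, and why keyword matching mis-blocks pages that use a keyword in another sense or context.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blocklist of hostnames (illustrative only); a real deployment
# would load a maintained list and would have to keep it up to date.
BLOCKED_HOSTS = frozenset({"adult.example", "casino.example"})

# Hypothetical keyword list for naive content filtering (illustrative only).
BLOCKED_KEYWORDS = frozenset({"sex", "porn"})


def url_blocked(url: str, blocked_hosts: frozenset = BLOCKED_HOSTS) -> bool:
    """URL-based filtering: block if the host or any parent domain is listed."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # www.adult.example checks "www.adult.example", "adult.example", "example"
    return any(".".join(labels[i:]) in blocked_hosts for i in range(len(labels)))


def keyword_blocked_substring(text: str, keywords: frozenset = BLOCKED_KEYWORDS) -> bool:
    """Naive content filtering: block if any keyword occurs anywhere in the text."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def keyword_blocked_token(text: str, keywords: frozenset = BLOCKED_KEYWORDS) -> bool:
    """Slightly stricter: block only if a keyword appears as a whole word."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return not tokens.isdisjoint(keywords)


if __name__ == "__main__":
    # A listed site is caught; the same content on a new domain or raw IP is not.
    for url in ("http://www.adult.example/page",
                "http://adult-mirror.example/page",
                "http://203.0.113.7/"):
        print(url, url_blocked(url))  # True, then False, then False

    # Substring matching mis-blocks "Essex"; token matching avoids that but
    # still blocks a health page that uses the keyword in a different context.
    for page in ("Welcome to Essex County Council",
                 "Sex education: a health guide for teenagers"):
        print(page, keyword_blocked_substring(page), keyword_blocked_token(page))
```

Matching whole tokens removes the substring false positive, but neither keyword matcher can tell a health page from an offensive one, and the URL check lets the mirror domain and the raw IP through until the list is updated.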
Similar resources
Named Entity Recognition for Web Content Filtering
Effective Web content filtering is a necessity in educational and workplace environments, but current approaches are far from perfect. We discuss a model for text-based intelligent Web content filtering, in which shallow linguistic analysis plays a key role. In order to demonstrate how this model can be realized, we have developed a lexical Named Entity Recognition system, and used it to improv...
A personalized web page content filtering model based on segmentation
In view of the massive explosion of content on the World Wide Web from diverse sources, content filtering tools have become mandatory. Filtering web page content is especially important when pages are accessed by minors. Traditional web page blocking systems follow a Boolean approach of either displaying the full page or blocking it completely. With the i...
A Comparative Study of Regulating the Filtering of Cyberspace in the US, the EU and China; Proposals for Policymaking in Iran
The crucial role of cyberspace has attracted the special attention of governments in different countries, which regard it as both a challenge and an opportunity. One of the key policies and preventive measures adopted in response to the challenges posed by cyberspace is its regulation. In fact, only a few states have not taken any steps to regulate their cyberspace. This paper seek...
Demystifying Internet Content Filtering for Businesses, Schools, and Libraries
Why it's needed, how it works, and solutions from SonicWALL®. Contents: Importance of content filtering: for businesses, for schools and libraries. How content filtering works: site blocking versus content monitoring, solution architectures. SonicWALL content filtering solutions: SonicWALL Content Filtering Service, SonicWALL Content Security Manager 2100 Content Filter, Features and be...
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the best-known recommendation methods is user-based Collaborative Filtering (CF). It compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that these similar users appear to like and that the active user has not yet rated. As a way of computing recommendations, the ultimate goal of the user-ba...
Publication date: 2004